
gh-113993: Allow interned strings to be mortal, and fix related issues#120520

Merged: encukou merged 79 commits into python:main from encukou:immortal-interned on Jun 21, 2024.

@encukou (Member) commented Jun 14, 2024

I've spent too much time looking at this myself; it wants more eyes :)

I spent a week learning about the string interning mechanism, and wrote up how I think it should work in an InternalDocs file I'm adding here.

I found a bunch of ... quirks, if not outright bugs. For example, we have duplicate singletons (e.g. _Py_ID(a) and the one-character Latin-1 singleton a). I don't think I can bring back mortal interned strings without getting my idea of the design in sync with the code, so this ended up being a big PR.
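For context, the invariant that duplicate singletons break can be seen from Python through the public sys.intern(): equal strings are supposed to map to a single canonical object. This is a small sketch; the helper runtime_copy is hypothetical and exists only to build equal strings as separate objects at runtime.

```python
import sys


def runtime_copy(parts):
    """Hypothetical helper: build a new str at runtime (not a compile-time constant)."""
    return "".join(parts)


x = runtime_copy(["inter", "ning"])
y = runtime_copy(["in", "terning"])

# Equal values, but typically two separate objects in CPython.
print(x == y, x is y)

# sys.intern() returns one canonical object per value, so identity holds
# after interning. Two different "canonical" objects for the same value
# (duplicate singletons) would violate exactly this guarantee.
canon_x = sys.intern(x)
canon_y = sys.intern(y)
assert canon_x is canon_y
```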


  • Add an InternalDocs file describing how interning should work and how to use it.
    (Please review this first!)

  • Add internal functions to explicitly request what kind of interning is done:

    • _PyUnicode_InternMortal
    • _PyUnicode_InternImmortal
    • _PyUnicode_InternStatic
  • Switch uses of PyUnicode_InternInPlace to those.

  • Disallow using _Py_SetImmortal on strings directly.
    You should use _PyUnicode_InternImmortal instead:

    • Strings should be interned before immortalization; otherwise you may end up
      immortalizing a copy rather than the interned string.
    • _Py_SetImmortal doesn't handle the SSTATE_INTERNED_MORTAL to
      SSTATE_INTERNED_IMMORTAL update, and those flags can't be changed in
      backports, as they are now part of public API and version-specific ABI.
  • Add a private _only_immortal argument for sys.XXX, used in the refleak test machinery.

  • Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:

    • _Py_ID
    • _Py_STR (including the empty string)
    • one-character latin-1 singletons

    Now, when you intern a singleton, that exact singleton will be interned.

  • Add a _Py_LATIN1_CHR macro, and use it instead of _Py_ID/_Py_STR for one-character Latin-1 singletons everywhere (including Argument Clinic).

  • Intern _Py_STR singletons at startup.

    Try this in 3.12:
    import sys
    
    a = sys.intern('<module>')  # normal string
    print('a', id(a), sys.getrefcount(a))
    try:
        raise Exception()
    except Exception as err:
        b = err.__traceback__.tb_frame.f_code.co_name  # same string via _Py_STR
    
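    # Interning equal strings should give one canonical object; on 3.12 this
    # can fail because a and b end up as two different interned objects.
    assert sys.intern(a) is sys.intern(b)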

    In 3.13 this reproducer no longer triggers the failure, but I don't think the underlying unsoundness was fixed.

  • For free-threaded builds, intern _Py_LATIN1_CHR singletons at startup.

  • Beef up the tests. Cover internal details (marked with @cpython_only).

  • Add lots of assertions.
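To make the kinds of interning in the list above more concrete, here is a toy Python model. It is not CPython's C implementation: ToyStr, ToyInternTable and their methods are hypothetical names, and State only loosely mirrors the SSTATE_* flags (SSTATE_INTERNED_MORTAL, SSTATE_INTERNED_IMMORTAL, plus a separate state for statically allocated strings). It sketches three rules: static singletons are interned at startup so they become the canonical object for their value, immortal interning interns first and then immortalizes the canonical object, and only mortal interned strings can leave the table.

```python
from enum import Enum


class State(Enum):
    """Toy interning states, loosely mirroring CPython's SSTATE_* flags."""
    NOT_INTERNED = 0
    INTERNED_MORTAL = 1
    INTERNED_IMMORTAL = 2
    INTERNED_IMMORTAL_STATIC = 3


class ToyStr:
    """Hypothetical stand-in for a str object that carries an interning state."""

    def __init__(self, value, static=False):
        self.value = value
        self.static = static  # models a statically allocated singleton
        self.state = State.NOT_INTERNED


class ToyInternTable:
    """Hypothetical model: at most one canonical ToyStr per value."""

    def __init__(self):
        self._canonical = {}  # value -> canonical ToyStr

    def intern_static(self, s):
        # Startup only: a static singleton must be the first object
        # registered for its value, so it becomes the canonical one.
        assert s.static and s.value not in self._canonical
        s.state = State.INTERNED_IMMORTAL_STATIC
        self._canonical[s.value] = s
        return s

    def intern_mortal(self, s):
        existing = self._canonical.get(s.value)
        if existing is not None:
            return existing  # caller must switch to the canonical object
        s.state = State.INTERNED_MORTAL
        self._canonical[s.value] = s
        return s

    def intern_immortal(self, s):
        # Intern first, then immortalize the canonical object. Immortalizing
        # the argument before interning could immortalize a mere copy.
        s = self.intern_mortal(s)
        if s.state is State.INTERNED_MORTAL:
            s.state = State.INTERNED_IMMORTAL
        return s

    def release(self, s):
        # Models deallocation: only mortal interned strings leave the table.
        if s.state is State.INTERNED_MORTAL:
            del self._canonical[s.value]
            s.state = State.NOT_INTERNED


table = ToyInternTable()

# A static singleton interned at startup wins over later equal strings.
module_name = table.intern_static(ToyStr("<module>", static=True))
assert table.intern_mortal(ToyStr("<module>")) is module_name

# Immortal interning upgrades the canonical object, not the copy passed in.
spam = table.intern_mortal(ToyStr("spam"))
spam_copy = ToyStr("spam")
assert table.intern_immortal(spam_copy) is spam
assert spam.state is State.INTERNED_IMMORTAL
assert spam_copy.state is State.NOT_INTERNED

# A released mortal string is replaced by the next equal string interned.
eggs = table.intern_mortal(ToyStr("eggs"))
table.release(eggs)
assert table.intern_mortal(ToyStr("eggs")) is not eggs
```

Returning the canonical object here plays the role of the pointer replacement done by C interning functions such as the public PyUnicode_InternInPlace(PyObject **p), which may swap the caller's reference for an already-interned string.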
